Building a Consistent 3D Representation of a Mobile Robot Environment by Combining Multiple Stereo Views
Abstract
In this short article, we report new results on our work on the problem of using passive Vision, and more precisely Stereo Vision, to build up consistent 3D geometric descriptions of the environment of a mobile robot. (This work is supported by the ESPRIT project P940.)

I - INTRODUCTION

The robot that we have built consists of a four-wheeled platform with two driving wheels operated by electrical motors. A set of three CCD cameras provides black and white images of the environment. The cameras are located at the vertices of a vertical, roughly equilateral triangle. Images are transmitted via a VHF link to a workstation where they are stored and made available through Ethernet to a number of processors. To the user, the vehicle appears as a standard peripheral and can be accessed as such from any terminal on the net. It is therefore a very convenient testbed for studying a number of problems in Vision.

One such problem is the following. Suppose we let our vehicle wander around in a building, using its ultrasound sensors to avoid obstacles, odometry to roughly estimate its motion, and its three cameras to compute 3D descriptions of its environment. One question then is: can we hope to combine coherently the various sources of information, and especially the visual information obtained at different times and from different places, and build up an accurate geometric 3D representation of the building even if each individual measurement is itself fairly inaccurate? We call this problem the Visual Fusion problem.

There are two deep issues associated with this question. First is the issue of the type of geometric representation used by the system: representations which are mathematically equivalent may behave quite differently on a real problem due to the unavoidable presence of noise and errors. This brings up the second issue, which is the question of how we represent and manipulate uncertainty. In the next Sections we propose a solution to these issues and present some results.

II - WHAT IS THE PROBLEM THAT WE ARE TRYING TO SOLVE?

Each triplet of images provided by the three cameras is analysed by a Stereo program described in [3,4]. This program outputs 3D line segments described in a coordinate system attached to the three cameras. Each line segment has a geometric description, which we elaborate on in the next Section, and an uncertainty, which we explain in Section IV. This uncertainty is directly related to the limited resolution and the geometry of the three cameras.

To relate the various coordinate systems corresponding to the different viewpoints, we estimate the rigid motions between them. This is done in two steps. First, a rough estimate is obtained by combining the odometry with the rotation of the cameras. Second, a better estimate is obtained by combining the two 3D representations provided by the Stereo program in the two positions of the vehicle. This is done by matching 3D segments which are present in the two views and is described in detail in [5,1]. The result is an estimate of the rotation matrix and translation vector between the coordinate systems attached to the cameras in their respective positions, together with some measure of their uncertainty (to be explained in Section IV). This having been completed, the current representation of the environment is a number of uncertain geometric primitives (here 3D line segments) attached to coordinate frames related by uncertain rigid motions.
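As a concrete illustration of the kind of representation this leads to, here is a minimal sketch in Python (the names and the choice of a 6x6 covariance for both primitives are assumptions for illustration, not the paper's actual data structures) of uncertain 3D segments and uncertain rigid motions, each carrying a covariance that stands for the measure of uncertainty mentioned above.

```python
# Sketch only: hypothetical containers for the uncertain primitives and
# uncertain rigid motions discussed above.
from dataclasses import dataclass
import numpy as np

@dataclass
class UncertainSegment3D:
    p1: np.ndarray    # 3-vector, first endpoint in the camera frame
    p2: np.ndarray    # 3-vector, second endpoint
    cov: np.ndarray   # 6x6 covariance of (p1, p2), from stereo resolution/geometry

@dataclass
class UncertainRigidMotion:
    R: np.ndarray     # 3x3 rotation matrix between two camera frames
    t: np.ndarray     # 3-vector translation between the frames
    cov: np.ndarray   # 6x6 covariance of the (rotation, translation) parameters

    def apply(self, x: np.ndarray) -> np.ndarray:
        """Map a point expressed in one camera frame into the other."""
        return self.R @ x + self.t
```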
The more we move the robot and measure, the more we increase the number of line segments, until we run out of memory. This is clearly unsatisfactory, and we must provide the system with the means of "forgetting intelligently". By this we mean the following. Let us consider a physical line segment S, like a part of the frame of a window or the edge of a desk. This line segment is very likely to have been detected in different positions 1, 2, ..., n of the mobile robot and is therefore present as segment S1 in position 1, segment S2 in position 2, ..., segment Sn in position n. Since we can relate position 1 to positions 2, 3, ..., n by rigid motions, by applying the right transformation the physical segment S is represented by n segments S'1, S'2, ..., S'n in the coordinate system attached to position 1. We would like our system to have the capability of automatically deciding that S'1, S'2, ..., and S'n are the same segment S and fusing them into one segment S, a combination of S'1, S'2, ..., S'n. This has two advantages. First, S, being a combination of n sources of information, should be at least as accurate as each of its instantiations, so the accuracy of the description is now able to increase. Second, since the system has recognized that S'1, S'2, ..., and S'n are the same segment S, it can forget them and remember only S: it is now able to "forget intelligently". In order to achieve this goal, two questions must be answered: how do we represent and manipulate geometry and uncertainty?

III - REPRESENTING LINES AND LINE SEGMENTS

The obvious way to represent a line is by choosing two points on it, or one point and a direction. The first representation has dimension 6 (the six coordinates of the two points); the second has dimension 5 (the three coordinates of the point and the two coordinates defining the direction as, for example, a unit vector on the Gaussian sphere). In fact, the minimal dimension of the representation of a line is four. This can be seen by choosing, in the second representation, the point such that the segment from the origin to the point is perpendicular to the line. The line is then located in the plane normal to that segment and can be determined by its orientation with respect to a known direction in that plane, i.e. by one parameter. A line segment is six-dimensional, being represented either by its two endpoints or by one endpoint (3 parameters), the line direction (2 parameters), and its length (1 parameter).

For our problem, even though we actually manipulate line segments, because this is what is provided by our stereo algorithms, what we would in fact like to fuse are lines. The reason for this is that segmentation errors, variations of illumination, inadequate edge detectors, and variations of viewpoints result in the same physical segments being instantiated as a variety of subsegments. Since we do not know the real segment, we must deal with its supporting line.

A convenient minimal (i.e. four-dimensional) representation of 3D lines is the (a, b, p, q) representation, where the line is defined by the two planes:

    x = a z + p,    y = b z + q    (1)

This representation is easily computed using the coordinates of two points on the line. The effect of a rigid motion defined by a rotation matrix R and a translation vector t can be readily assessed, i.e. if we rotate and translate a ...
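The following is a minimal sketch in Python (the helper names are hypothetical, not from the paper) of how the (a, b, p, q) parameters of equation (1) can be computed from two points on the line, and of one way to assess the effect of a rigid motion (R, t): transform two points of the line and recompute the parameters.

```python
# Sketch only: (a, b, p, q) representation of equation (1), x = a*z + p and
# y = b*z + q, built from two points and transported by a rigid motion.
import numpy as np

def line_abpq(m1: np.ndarray, m2: np.ndarray):
    """(a, b, p, q) of the line through points m1 and m2.
    Assumes the line is not perpendicular to the z axis (z1 != z2)."""
    dx, dy, dz = m2 - m1
    a, b = dx / dz, dy / dz
    p = m1[0] - a * m1[2]
    q = m1[1] - b * m1[2]
    return a, b, p, q

def transform_line(a, b, p, q, R: np.ndarray, t: np.ndarray):
    """(a, b, p, q) of the line after the rigid motion x -> R x + t."""
    m1 = np.array([p, q, 0.0])          # point of the line at z = 0
    m2 = np.array([a + p, b + q, 1.0])  # point of the line at z = 1
    return line_abpq(R @ m1 + t, R @ m2 + t)
```

A usage example: given two stereo endpoints in the frame of position 2 and the estimated motion (R, t) from position 2 to position 1, `transform_line(*line_abpq(e1, e2), R, t)` gives the supporting line expressed in the frame of position 1, where it can be compared and fused with the other instantiations of the same physical segment.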
Publication date: 1987